This lab aims to guide you through downloading American Community Survey (ACS) data, specifically focusing on income at the census tract level. By the end of this lab, you will be able to:
Access and download ACS data for one or multiple states.
Process and clean the data.
Perform basic analysis on income data.
Before starting, make sure you have the following packages installed.
#install.packages("tidyverse")
#install.packages("tidycensus")
#install.packages("sf")
#install.packages("tigris")
#install.packages("mapview")
#install.packages("ineq")
#install.packages("tmap")
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.3 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(tidycensus)
library(sf)
## Linking to GEOS 3.10.2, GDAL 3.4.2, PROJ 8.2.1; sf_use_s2() is TRUE
library(tigris)
## To enable caching of data, set `options(tigris_use_cache = TRUE)`
## in your R script or .Rprofile.
library(mapview)
You need a Census API key to access ACS data. If you don’t have one, you can request it at https://api.census.gov/data/key_signup.html. Once you have your key, set it up in R.
# Replace YOUR_CENSUS_API_KEY with your own key.
# Run this once to store the key in ~/.Renviron for future sessions:
# census_api_key("YOUR_CENSUS_API_KEY", install = TRUE)
# Or set the key for the current session only:
census_api_key("YOUR_CENSUS_API_KEY", overwrite = TRUE)
## To install your API key for use in future sessions, run this function with `install = TRUE`.
readRenviron("~/.Renviron")
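Before requesting data, it can help to confirm that a key is actually visible to the R session. The sketch below assumes the key is stored in the `CENSUS_API_KEY` environment variable, which is where `census_api_key()` places it; the helper name `has_census_key` is made up for this illustration.

```r
# Helper (hypothetical name): TRUE when a Census key is available to this session
has_census_key <- function() nzchar(Sys.getenv("CENSUS_API_KEY"))

if (!has_census_key()) {
  message("No CENSUS_API_KEY found; run census_api_key() or edit ~/.Renviron.")
}
```

Checking the environment variable rather than hard-coding the key keeps the key out of scripts that you share or submit.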
Use the tidycensus package to download ACS data. We’ll focus on median household income for Virginia at the census tract level.
# Define variables for median household income
variables <- c(income = "B19013_001")
# Download data for Virginia
acs_data <- get_acs(
  geography = "tract",
  variables = variables,
  state = "VA",
  year = 2020,
  survey = "acs5",
  geometry = TRUE
)
## Getting data from the 2016-2020 5-year ACS
## Downloading feature geometry from the Census website. To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
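The download message above suggests caching the boundary files. A minimal sketch of that setting follows; it only sets the option for the current session, and placing the same line in your `.Rprofile` would make it persistent.

```r
# Cache downloaded boundary files so repeated get_acs(geometry = TRUE)
# calls can reuse them instead of downloading again
options(tigris_use_cache = TRUE)

# Confirm the option is set for this session
getOption("tigris_use_cache")
```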
# View the first few rows of the data
head(acs_data)
## Simple feature collection with 6 features and 5 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -78.78807 ymin: 37.51485 xmax: -77.18975 ymax: 38.89167
## Geodetic CRS: NAD83
## GEOID NAME variable estimate
## 1 51087200409 Census Tract 2004.09, Henrico County, Virginia income 54940
## 2 51760021000 Census Tract 210, Richmond city, Virginia income NA
## 3 51003010100 Census Tract 101, Albemarle County, Virginia income 87000
## 4 51059471401 Census Tract 4714.01, Fairfax County, Virginia income 116683
## 5 51059432500 Census Tract 4325, Fairfax County, Virginia income 157426
## 6 51059492300 Census Tract 4923, Fairfax County, Virginia income 112305
## moe geometry
## 1 16759 MULTIPOLYGON (((-77.53545 3...
## 2 NA MULTIPOLYGON (((-77.40362 3...
## 3 24390 MULTIPOLYGON (((-78.78807 3...
## 4 21233 MULTIPOLYGON (((-77.20789 3...
## 5 14290 MULTIPOLYGON (((-77.26601 3...
## 6 23718 MULTIPOLYGON (((-77.25242 3...
# Rename the columns so that the income estimate is stored as Median_Income
colnames(acs_data) <- c("GEOID", "NAME", "variable", "Median_Income", "moe", "geometry")
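The `moe` column is the margin of error of each estimate, which the ACS publishes at the 90% confidence level. One common way to judge reliability is the coefficient of variation (CV). The sketch below uses three rows copied from the output above; the data frame name `income` and the derived column names are assumptions made for this illustration.

```r
# Three tracts taken from the head() output above
income <- data.frame(
  GEOID = c("51087200409", "51003010100", "51059432500"),
  Median_Income = c(54940, 87000, 157426),
  moe = c(16759, 24390, 14290)
)

# Convert the 90% margin of error to a standard error, then to a CV in percent
income$se <- income$moe / 1.645
income$cv_pct <- 100 * income$se / income$Median_Income

# Tracts sorted from most to least reliable (smallest CV first)
income[order(income$cv_pct), ]
```

A large CV means the tract estimate is noisy, which is worth keeping in mind when interpreting tract-level maps and rankings.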
Perform basic analysis to explore income levels.
# Summary statistics for median income
summary(acs_data$Median_Income)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 2499 52132 72256 84248 106560 250001 46
# Plot median income distribution
ggplot(acs_data, aes(x = Median_Income)) +
  geom_histogram(binwidth = 5000, fill = "blue", color = "black") +
  labs(title = "Distribution of Median Household Income in Virginia",
       x = "Median Household Income",
       y = "Frequency")
## Warning: Removed 46 rows containing non-finite outside the scale range
## (`stat_bin()`).
Visualize the data on a map.
# Plot median income on a map
ggplot(acs_data) +
  geom_sf(aes(fill = Median_Income)) +
  scale_fill_viridis_c() +
  labs(title = "Median Household Income by Census Tract in Virginia",
       fill = "Median Income")
## Step 7: Download ACS/census data for multiple states
acs_data <- get_acs(
  geography = "tract",
  variables = "B19013_001",
  state = c("DC", "MD", "VA"),
  year = 2020,
  survey = "acs5",
  geometry = TRUE
)
## Getting data from the 2016-2020 5-year ACS
## Downloading feature geometry from the Census website. To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
## Fetching tract data by state and combining the result.
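# Rename the columns so that the income estimate is stored as Median_Income
colnames(acs_data) <- c("GEOID", "NAME", "variable", "Median_Income", "moe", "geometry")
# Interactive map colored by median income
mapview(acs_data, zcol = "Median_Income")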
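After combining several states, it is often useful to know which state each tract belongs to. The first two characters of a tract GEOID are the state FIPS code (11 for DC, 24 for MD, 51 for VA). The sketch below uses three Virginia rows from the earlier output plus two made-up rows for DC and MD; those two GEOIDs and income values are hypothetical.

```r
tracts <- data.frame(
  GEOID = c("51087200409", "51760021000", "51059471401",
            "11001000100", "24031700101"),          # last two are hypothetical
  Median_Income = c(54940, NA, 116683, 90000, 120000)
)

# Map the two-digit state FIPS prefix to a state abbreviation
state_lookup <- c("11" = "DC", "24" = "MD", "51" = "VA")
tracts$state <- unname(state_lookup[substr(tracts$GEOID, 1, 2)])

# Median of tract medians by state (rows with NA income are dropped)
aggregate(Median_Income ~ state, data = tracts, FUN = median)
```

The same `substr()` call should work on the `GEOID` column of `acs_data`, since tidycensus returns GEOIDs as character strings.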
Now that we have income data for DC, MD, and VA, let’s focus on the DMV (District of Columbia, Maryland, and Virginia) area and quantify income inequality using the Gini index. The Gini index is a measure of statistical dispersion: a value of 0 represents perfect equality, while a value of 1 indicates maximal inequality. Because we compute it here from tract-level median incomes, it describes inequality between tracts rather than between individual households.
# Read the DMV boundary (dmv.shp must be in your working directory)
dmv <- st_read("dmv.shp")
## Reading layer `dmv' from data source
## `/Users/yshao/work/Geog4254-5254G/lab4/dmv.shp' using driver `ESRI Shapefile'
## Simple feature collection with 1 feature and 7 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -78.45 ymin: 37.99 xmax: -76.38 ymax: 39.72
## Geodetic CRS: WGS 84
plot(dmv)
## Step 2: Select tracts by location (dmv boundary)
# Transform the CRS of the ACS data to match the DMV shapefile
acs_data <- st_transform(acs_data, st_crs(dmv))
# Perform the spatial subset
dmv_data <- acs_data[dmv, , op = st_intersects]
mapview(dmv_data, zcol = "Median_Income")
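The bracket syntax used for the spatial subset can look cryptic. The sketch below reproduces it on toy geometry: two hypothetical square "tracts" and a hypothetical square boundary, all without a CRS. It also shows `st_filter()`, which should give the same result in a more readable form.

```r
library(sf)

# Helper (hypothetical): build a square polygon from its lower-left corner
square <- function(x0, y0, size = 1) {
  st_polygon(list(rbind(
    c(x0, y0), c(x0 + size, y0), c(x0 + size, y0 + size),
    c(x0, y0 + size), c(x0, y0)
  )))
}

# One tract overlaps the boundary, the other is far away
tracts <- st_sf(
  id = c("inside", "outside"),
  geometry = st_sfc(square(0, 0), square(10, 10))
)
boundary <- st_sf(geometry = st_sfc(square(-0.5, -0.5, size = 2)))

# Keep the tracts that intersect the boundary, written two ways
subset_bracket <- tracts[boundary, , op = st_intersects]
subset_filter <- st_filter(tracts, boundary, .predicate = st_intersects)

subset_bracket$id
```

Note that `st_intersects` also keeps tracts that only partly overlap or merely touch the boundary, so tracts along the edge of the study area are included.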
# The ineq package provides Gini(); missing values are dropped by default (na.rm = TRUE)
library(ineq)
gini_index <- Gini(dmv_data$Median_Income)
# Print the Gini index
print(gini_index)
## [1] 0.2261062
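To see what the index measures, it can be computed by hand. The sketch below uses the standard rank-based formula G = 2 Σ i·x(i) / (n Σ x) − (n + 1) / n on incomes sorted in ascending order; it should agree with the uncorrected result from `Gini()`, but treat that as an assumption to verify. The function name `gini_manual` and the example values are made up for illustration.

```r
gini_manual <- function(x) {
  x <- sort(x[!is.na(x)])   # drop missing tracts, order ascending
  n <- length(x)
  2 * sum(seq_len(n) * x) / (n * sum(x)) - (n + 1) / n
}

# Four tracts with identical incomes: no inequality, result is 0
gini_manual(c(50000, 50000, 50000, 50000))

# One tract holds all the income: result is (n - 1) / n = 0.75
gini_manual(c(0, 0, 0, 100000))
```

The second case shows why the uncorrected index approaches, but does not reach, 1 for a small number of observations.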
Save the processed data as a shapefile.
st_write(dmv_data, "acs_data_dmv.shp", delete_layer = TRUE)
## Warning in abbreviate_shapefile_names(obj): Field names abbreviated for ESRI
## Shapefile driver
## Writing layer `acs_data_dmv' to data source
## `acs_data_dmv.shp' using driver `ESRI Shapefile'
## Writing 1546 features with 5 fields and geometry type Multi Polygon.
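The warning above appears because shapefile field names are limited to 10 characters, so `Median_Income` gets abbreviated. If the full names matter, one alternative is the GeoPackage format. The sketch below is a toy example rather than part of the lab workflow: the point layer and the temporary file path are made up for illustration.

```r
library(sf)

# Toy layer with a long column name (coordinates are made up)
toy <- st_sf(
  Median_Income = c(54940, 87000),
  geometry = st_sfc(st_point(c(-77.5, 37.5)), st_point(c(-78.8, 38.0)), crs = 4326)
)

# Write to a GeoPackage in a temporary location, then read it back
gpkg_path <- tempfile(fileext = ".gpkg")
st_write(toy, gpkg_path, quiet = TRUE)
toy_back <- st_read(gpkg_path, quiet = TRUE)

# The full column name survives the round trip
names(toy_back)
```

For the lab data, the equivalent call would pass `dmv_data` and a `.gpkg` file name to `st_write()`.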
Identify and report the tract with the highest median income in the DMV.
Identify and report the tract with the lowest median income in the DMV.
Calculate the Gini index for the Richmond, VA Metro Area. The spatial boundary of the Richmond Metro Area [Richmond_metro.shp] is included in the Lab 4 folder. You will need to revise the script, particularly the section on selecting by location, as well as other lab steps, to obtain the answer.
Create a map visualizing the median household income for each tract within the Richmond, VA Metro Area using the tmap package. Attach the resulting map.